A kernel extension to handle missing data
Authors
Abstract
An extension for univariate kernels that deals with missing values is proposed. These extended kernels are shown to be valid Mercer kernels and can adapt to many types of variables, such as categorical or continuous. The proposed kernels are tested against standard RBF kernels on a variety of benchmark problems with different amounts of missing values and different variable types. The experimental results are encouraging: the extended kernels usually yield improvements, ranging from slight to substantial, over those achieved with standard methods.
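The abstract does not give the definition of the extended kernels, so the following is only a minimal sketch of one way a univariate kernel can be extended to missing values, not the authors' construction. It assumes a missing value (encoded as NaN) is replaced by the average of the kernel over the observed values of that variable, which amounts to mapping a missing entry to the mean feature vector and therefore keeps the Gram matrix positive semidefinite. The names `rbf`, `extended_kernel_1d`, `gram_matrix` and the parameter `gamma` are hypothetical, and each variable is assumed to have at least one observed value.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    """Standard univariate RBF kernel (broadcasts over arrays)."""
    return np.exp(-gamma * (a - b) ** 2)

def extended_kernel_1d(x, y, observed, gamma=1.0):
    """Univariate kernel extended to NaN inputs (illustrative assumption).

    observed: 1-D array of the non-missing values of this variable.
    A missing argument is averaged out over `observed`.
    """
    x_missing, y_missing = np.isnan(x), np.isnan(y)
    if not x_missing and not y_missing:
        return rbf(x, y, gamma)
    if x_missing and y_missing:
        # Average of the kernel over all pairs of observed values.
        return rbf(observed[:, None], observed[None, :], gamma).mean()
    known = y if x_missing else x
    # Average of the kernel between the known value and each observed value.
    return rbf(known, observed, gamma).mean()

def gram_matrix(X, gamma=1.0):
    """Average the extended univariate kernels over the columns of X."""
    n, d = X.shape
    K = np.zeros((n, n))
    for j in range(d):
        col = X[:, j]
        observed = col[~np.isnan(col)]
        for a in range(n):
            for b in range(a, n):
                v = extended_kernel_1d(col[a], col[b], observed, gamma)
                K[a, b] += v
                if a != b:
                    K[b, a] += v
    return K / d

# Toy data with missing entries in both columns.
X = np.array([[0.0, 1.0],
              [np.nan, 2.0],
              [1.0, np.nan],
              [np.nan, np.nan]])
K = gram_matrix(X)
```

The resulting matrix `K` could be passed to any kernel method that accepts a precomputed Gram matrix. Categorical variables would need a different base kernel in place of `rbf`.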
Similar resources
An Enhanced Approach to Handle Missing Values in Heterogeneous Dataset
Generally, data mining (sometimes called data or knowledge discovery, or knowledge extraction) is the process of analyzing large volumes of data from different perspectives and summarizing them into useful information. Data quality is therefore important for obtaining high-quality patterns as a result. Quality decisions ought to be based on quality data. Data quality is affected b...
Handling missing values in kernel methods with application to microbiology data
We discuss several approaches that make it possible for kernel methods to deal with missing values. The first two are extended kernels able to handle missing values without data preprocessing. Another two methods are derived from a sophisticated multiple imputation technique that uses logistic regression as the local model learner. The performance of these approaches is compared using a binary...
Ensemble Learning with Supervised Kernels
Kernel-based methods have outstanding performance on many machine learning and pattern recognition tasks. However, they are sensitive to kernel selection, they may have low tolerance to noise, and they cannot deal with mixed-type or missing data. We propose to derive a novel kernel from an ensemble of decision trees. This leads to kernel methods that naturally handle noisy and heterogeneous da...
An Estimation of Missing Values by Modified Mixed Kernels
In statistical practice, the difficulties caused by missing data are universal. Several techniques are used to handle the problem of missing data. They include both old approaches, which require only a small amount of mathematical computation, and new approaches, which require additional difficult computations that are ever easier for social work researchers to carry out the statistical programming ...
Investigating the missing data effect on credit scoring rule based models: The case of an Iranian bank
Credit risk management is a process in which banks estimate the probability of default (PD) for each loan applicant. Data sets of previous loan applicants are built by gathering their data; these internal data sets are usually completed with data from external credit bureaus and finally used by banks for estimating PD. There is also continuing interest among banks in using rule based classifiers to b...